Search Results

Documents authored by Lehmann, Hans-Peter


Document
High Performance Construction of RecSplit Based Minimal Perfect Hash Functions

Authors: Dominik Bez, Florian Kurpicz, Hans-Peter Lehmann, and Peter Sanders

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)


Abstract
A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space efficient practical minimal perfect hash function. It heavily relies on trying out hash functions in a brute force way. We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs. In combination, the resulting improvements yield speedups up to 239 on an 8-core CPU and up to 5438 using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 Million objects with 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is more energy efficient than the original implementation.

Cite as

Dominik Bez, Florian Kurpicz, Hans-Peter Lehmann, and Peter Sanders. High Performance Construction of RecSplit Based Minimal Perfect Hash Functions. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 19:1-19:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{bez_et_al:LIPIcs.ESA.2023.19,
  author =	{Bez, Dominik and Kurpicz, Florian and Lehmann, Hans-Peter and Sanders, Peter},
  title =	{{High Performance Construction of RecSplit Based Minimal Perfect Hash Functions}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{19:1--19:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.19},
  URN =		{urn:nbn:de:0030-drops-186728},
  doi =		{10.4230/LIPIcs.ESA.2023.19},
  annote =	{Keywords: compressed data structure, parallel perfect hashing, bit parallelism, GPU, SIMD, parallel computing, vector instructions}
}
Document
Learned Monotone Minimal Perfect Hashing

Authors: Paolo Ferragina, Hans-Peter Lehmann, Peter Sanders, and Giorgio Vinciguerra

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)


Abstract
A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications range from databases, search engines, data encryption, to pattern-matching algorithms. In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we solve the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 34% less space than the next larger competitor, while achieving about 16 times faster queries. On real-world datasets, the space usage is very close to or much better than the best competitors, while achieving up to 19 times faster queries than the next larger competitor. As far as the construction of LeMonHash is concerned, we get an improvement by a factor of up to 2, compared to the competitor with the next best space usage. We also investigate the case of keys being variable-length strings, introducing the so-called LeMonHash-VL: it needs space within 13% of the best competitors while achieving up to 3 times faster queries than the next larger competitor.

Cite as

Paolo Ferragina, Hans-Peter Lehmann, Peter Sanders, and Giorgio Vinciguerra. Learned Monotone Minimal Perfect Hashing. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 46:1-46:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{ferragina_et_al:LIPIcs.ESA.2023.46,
  author =	{Ferragina, Paolo and Lehmann, Hans-Peter and Sanders, Peter and Vinciguerra, Giorgio},
  title =	{{Learned Monotone Minimal Perfect Hashing}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{46:1--46:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.46},
  URN =		{urn:nbn:de:0030-drops-186990},
  doi =		{10.4230/LIPIcs.ESA.2023.46},
  annote =	{Keywords: compressed data structure, monotone minimal perfect hashing, retrieval}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail